196 research outputs found

    Un environnement générique et ouvert pour le traitement des expressions polylexicales

    The treatment of multiword expressions (MWEs), like take off, bus stop and big deal, is a challenge for NLP applications. This kind of linguistic construction is not only arbitrary but also much more frequent than one would initially guess. This thesis investigates the behaviour of MWEs across different languages, domains and construction types, proposing and evaluating an integrated methodological framework for their acquisition. There have been many theoretical proposals to define, characterise and classify MWEs. We adopt a generic definition stating that MWEs are word combinations which must be treated as a unit at some level of linguistic processing. They present a variable degree of institutionalisation, arbitrariness, heterogeneity and limited syntactic and semantic variability. There has been much research on automatic MWE acquisition in recent decades, and the state of the art covers a large number of techniques and languages. Other tasks involving MWEs, namely disambiguation, interpretation, representation and applications, have received less emphasis in the field. The first main contribution of this thesis is the proposal of an original methodological framework for automatic MWE acquisition from monolingual corpora. This framework is generic, language independent and integrated, and it comes with a freely available implementation, the mwetoolkit. It is composed of independent modules which may themselves use multiple techniques to solve a specific sub-task in MWE acquisition. The evaluation of MWE acquisition is modelled using four independent axes. We underline that the evaluation results depend on parameters of the acquisition context, e.g., nature and size of corpora, language and type of MWE, analysis depth, and existing resources. The second main contribution of this thesis is the application-oriented evaluation of our methodology proposal in two applications: computer-assisted lexicography and statistical machine translation. 
For the former, we evaluate the usefulness of automatic MWE acquisition with the mwetoolkit for creating three lexicons: Greek nominal expressions, Portuguese complex predicates and Portuguese sentiment expressions. For the latter, we test several integration strategies in order to improve the treatment given to English phrasal verbs when translated by a standard statistical MT system into Portuguese. Both applications can benefit from automatic MWE acquisition, as the expressions acquired automatically from corpora can both speed up and improve the quality of the results. The promising results of previous and ongoing experiments encourage further investigation into the optimal way to integrate MWE treatment into other applications. Thus, we conclude the thesis with an overview of the past, ongoing and future work.

    The Impact of Word Representations on Sequential Neural MWE Identification

    Recent initiatives such as the PARSEME shared task have allowed the rapid development of MWE identification systems. Many of those are based on recent NLP advances, using neural sequence models that take continuous word representations as input. We study two related questions in neural verbal MWE identification: (a) the use of lemmas and/or surface forms as input features, and (b) the use of word-based or character-based embeddings to represent them. Our experiments on Basque, French, and Polish show that character-based representations yield systematically better results than word-based ones. In some cases, character-based representations of surface forms can be used as a proxy for lemmas, depending on the morphological complexity of the language.

    Towards Higher Quality Internal and Outside Multilingualization of Web Sites

    The multilingualization of Web sites with high quality is increasingly important, but is unsolvable in most situations where internal quality certification is needed, and not solved in the majority of other situations. We demonstrate it by analyzing a variety of techniques to make the underlying software easily localizable and to manage the translation of textual content in the classical internal mode, that is, by modifying the language-dependent resources. A new idea is that volunteer final users should be able to contribute to the improvement or even production of translated resources and content. For this, we have developed a piece of PHP code which naive webmasters (neither computer scientists nor professional translators) can add to a Web site to enable internal multilingualization by users with enough access rights: in management mode, these users can edit the texts of titles, button labels, messages, etc. in text areas appearing in context in the Web page. If Web site developers follow some recommendations, all textual interface elements should be localizable in this way. Another angle of attack, applicable in all cases where navigating a site through a gateway is possible, consists in replacing the problem of diffusion by the problem of access in multiple languages. We introduce the concept of iMAG (interactive Multilingual Access Gateway, dedicated to a Web site or domain) to solve the problem of higher quality multilingual access. First, by using available MT systems or by default morphological processors and bilingual dictionaries, any page of an elected website is made instantly accessible in many languages, with a generally low quality profile, as through usual translation gateways. Over time, the quality profile of textual GUI elements, Web pages and even documents (if accessible in HTML) will improve thanks to outside contributors, who will post-edit or produce the translations from the reading context. 
This is only possible because the iMAG associated with the website stores the translations in its translation memory (TM) and the contributed dictionary items in its dictionary. The TM has quality levels, according to the users' profiles, and scores within levels. An API will be proposed so that the developers of the elected website can connect their site to its iMAG, retrieve the best level translations, certify them if necessary, and put them in their localized resources. At that point, external localization meets internal localization.
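The abstract above says only that the translation memory has quality levels with scores inside each level, and that the best-level translations can be retrieved. One way such a lookup might work is sketched below. The `Candidate` type, its field names and the rule that a higher level always outranks a higher score are assumptions made for illustration, not the iMAG API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One stored translation of a source segment (hypothetical structure)."""
    text: str
    level: int    # assumed: higher means a more trusted contributor profile
    score: float  # assumed: rating of the translation within its level


def best_translation(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Return the candidate with the highest level, breaking ties by score."""
    pool = list(candidates)
    if not pool:
        return None
    # Tuples compare element by element, so level dominates score.
    return max(pool, key=lambda c: (c.level, c.score))


memory = [
    Candidate("Accueil", level=1, score=0.9),        # e.g. raw MT output
    Candidate("Page d'accueil", level=3, score=0.6),  # e.g. certified post-edit
    Candidate("Page principale", level=3, score=0.4),
]
print(best_translation(memory).text)
```

Under these assumptions a low-scored translation from a trusted level still wins over a high-scored one from a lower level, which matches the idea that levels follow user profiles and scores only rank entries within a level.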

    Non-conventional vascular accesses for the management of superior vena cava syndrome in patients with Intestinal Failure: case series and systematic review

    Background: Type III Intestinal Failure (IF) is a devastating clinical condition characterized by the inability of the gut to absorb necessary macronutrients, and/or water and electrolytes, requiring Parenteral Nutrition (PN) as chronic therapy. Long-term PN may lead to life-threatening complications; the loss of central venous access (LCVA) is the most frequent and challenging. To date, few studies in the literature have reported the relevance of Non-conventional Vascular Accesses (NCVA) in the management of IF as part of comprehensive multidisciplinary care. Methods: A retrospective analysis of a database collected from January 2006 to December 2019 was performed using SPSS v25.0 for statistical analysis, followed by a systematic review using the PRISMA methodology. Results: From January 2006 to December 2019, 184 NCVA were placed in 71 patients with LCVA as an IF-related complication; 173 were placed in 61 patients by interventional radiology (IR) and 11 NCVA were placed in 10 patients by the surgical team during the intestinal transplant (ITx) operation. Of the 173 IR procedures, 166 (95.9%) were successful, with 3 ± 2.7 procedures/patient; average catheter permanence rate was 738.68 ± 997 days; complications related to the procedures occurred in 18/173 (10.4%), including two deaths. On the other hand, among the 11 NCVA implanted by the surgical team, 7 (64%) were successful and were safely withdrawn 30 days after ITx when they were no longer needed; 2 (18%) catheters malfunctioned during the first week and could not be further used, and 1 was accidentally removed; average catheter permanence rate was 26 ± 4 days. There was one complication (9%) requiring laparotomy; there was no mortality associated with the procedure in this group. A systematic review was conducted to evaluate the success and safety of NCVA as part of the treatment of HPN-related complications; from 337,542 papers, 14 studies were included. 
A total of 28 HPN patients with LCVA received NCVA; 34 procedures were successfully performed, while procedure-related complications were reported in 11.7%, as well as one death. Conclusions: The data analyzed show that NCVAs may be successfully placed by expert teams, making it possible to sustain long-term PN and increasing the applicability of intestinal transplantation for candidates in extreme need of vascular access. Affiliation: Pérez Illidge, Luis Carlos. Universidad Favaloro; Argentina. Affiliation: Ramisch, Diego. Universidad Favaloro; Argentina. Affiliation: Valdivieso, León. Universidad Favaloro; Argentina. Affiliation: Guzman, Carlos. Universidad Favaloro; Argentina. Affiliation: Antoni, Diego. Universidad Favaloro; Argentina. Affiliation: Rumbo, Carolina. Universidad Favaloro; Argentina. Affiliation: Trentadue, Julio. Universidad Favaloro; Argentina. Affiliation: Solar, Héctor. Universidad Favaloro; Argentina. Affiliation: Gentilini, Maria Virginia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Medicina Traslacional, Trasplante y Bioingeniería. Fundación Favaloro. Instituto de Medicina Traslacional, Trasplante y Bioingeniería; Argentina. Affiliation: Gondolesi, Gabriel Eduardo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Medicina Traslacional, Trasplante y Bioingeniería. Fundación Favaloro. Instituto de Medicina Traslacional, Trasplante y Bioingeniería; Argentina.

    The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions

    Multiword expressions (MWEs) are known as a "pain in the neck" for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one's heart or to turn off, have rarely been modelled. This is notably due to their syntactic variability, which hinders treating them as "words with spaces". We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.

    Edition 1.2 of the PARSEME Shared Task on Semi-supervised Identification of Verbal Multiword Expressions

    We present edition 1.2 of the PARSEME shared task on identification of verbal multiword expressions (VMWEs). Lessons learned from previous editions indicate that VMWEs have low ambiguity, and that the major challenge lies in identifying test instances never seen in the training data. Therefore, this edition focuses on unseen VMWEs. We have split annotated corpora so that the test corpora contain around 300 unseen VMWEs, and we provide non-annotated raw corpora to be used by complementary discovery methods. We released annotated and raw corpora in 14 languages, and this semi-supervised challenge attracted 7 teams who submitted 9 system results. This paper describes the effort of corpus creation, the task design, and the results obtained by the participating systems, especially their performance on unseen expressions.
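The abstract above does not spell out how an expression is judged "never seen in the training data". A minimal sketch of one plausible criterion follows, assuming each annotated VMWE is reduced to the list of lemmas of its components and compared as a multiset, so that word order and inflection are ignored. The function names and toy data are illustrative and are not taken from the shared task's tooling.

```python
from collections import Counter
from typing import FrozenSet, List, Sequence, Tuple


def lemma_key(lemmas: Sequence[str]) -> FrozenSet[Tuple[str, int]]:
    """Order-insensitive, hashable key: the multiset of an expression's lemmas."""
    return frozenset(Counter(lemmas).items())


def unseen_expressions(train: List[List[str]], test: List[List[str]]) -> List[List[str]]:
    """Return the test expressions whose lemma multiset never occurs in train."""
    seen = {lemma_key(expr) for expr in train}
    return [expr for expr in test if lemma_key(expr) not in seen]


# Toy annotations: each VMWE is the list of lemmas of its components.
train = [["take", "decision"], ["turn", "off"]]
test = [["decision", "take"], ["break", "heart"]]

print(unseen_expressions(train, test))  # [['break', 'heart']]
```

Here "decision ... take" counts as seen even though its components appear in a different order, which is the kind of variability a surface-string comparison would miss.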

    Representation and parsing of multiword expressions: Current trends

    This book consists of contributions related to the definition, representation and parsing of MWEs. These reflect current trends in the representation and processing of MWEs. They cover various categories of MWEs such as verbal, adverbial and nominal MWEs, various linguistic frameworks (e.g. tree-based and unification-based grammars), various languages (including English, French, Modern Greek, Hebrew and Norwegian), and various applications (namely MWE detection, parsing and automatic translation), using both symbolic and statistical approaches.